
Conversation

@cubewise-gng
Contributor

To address #1307, write_dataframe_async now aggregates the dataframe before parallel processing.

@MariusWirtz
Collaborator

Would it be easier to call the existing aggregate_duplicate_intersections function directly from write_dataframe_async?

@cubewise-gng
Contributor Author

The dataframe may contain a mix of string and numeric values, and I did not want to aggregate the strings, so I avoided calling aggregate_duplicate_intersections directly.
Is that the right approach?

@MariusWirtz
Collaborator

> The dataframe may contain a mix of string and numeric values, and I did not want to aggregate the strings, so I avoided calling aggregate_duplicate_intersections directly.
> Is that the right approach?

Yes. Good catch. If increment is True, we should increment the numeric values and apply last-one-wins on string values.

Changing the increment default value to True is technically a breaking change, but I would argue that this counts as a bugfix.
With increment set to False (the current behaviour), the function produces nondeterministic results when handling duplicate numeric records.
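The split described above can be sketched in pandas. This is an illustrative stand-in, not the actual aggregate_duplicate_intersections implementation from TM1py; it assumes the last column of the frame holds the cell value and all preceding columns are dimension coordinates:

```python
import pandas as pd

def aggregate_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Sum duplicate numeric records, keep the last string record per intersection."""
    key_cols = list(df.columns[:-1])
    value_col = df.columns[-1]

    # Rows whose value parses as a number are treated as numeric cells.
    numeric_mask = pd.to_numeric(df[value_col], errors="coerce").notna()

    # Increment semantics: sum duplicate numeric intersections.
    numeric = df[numeric_mask].copy()
    numeric[value_col] = pd.to_numeric(numeric[value_col])
    numeric = numeric.groupby(key_cols, as_index=False, sort=False)[value_col].sum()

    # Last-one-wins for string cells on the same intersection.
    strings = df[~numeric_mask].drop_duplicates(subset=key_cols, keep="last")

    return pd.concat([numeric, strings], ignore_index=True)
```

With a frame containing two numeric records and two string records on the same intersections, the numeric pair is summed while only the last string survives.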

I will run all tests and merge.

cellset = CaseAndSpaceInsensitiveTuplesDict(
dict(zip(df.iloc[:, :-1].itertuples(index=False, name=None), df.iloc[:, -1].values))
)
return cellset
Collaborator


This should return a dataframe.

Currently it returns a dictionary, which breaks existing code and existing test cases.
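Beyond the return-type mismatch, keying a plain dict on the coordinate tuples also collapses duplicate intersections silently (last key wins). A minimal illustration with a toy frame (the real code uses TM1py's CaseAndSpaceInsensitiveTuplesDict; a plain dict stands in for it here):

```python
import pandas as pd

# Two records on the same ("a", "x") intersection.
df = pd.DataFrame({"d1": ["a", "a"], "d2": ["x", "x"], "value": [1, 2]})

# dict(zip(...)) keys on the coordinate tuples; the duplicate
# intersection silently collapses to the last value seen.
cellset = dict(
    zip(df.iloc[:, :-1].itertuples(index=False, name=None), df.iloc[:, -1].values)
)
assert cellset == {("a", "x"): 2}  # the value 1 is lost
```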

Contributor Author


Thanks, made the correction.

@github-actions
Contributor

github-actions bot commented Dec 7, 2025

Tests completed for environment: tm1-11-cloud. Check artifacts for details.
